Sampling with Minimum Sum of Squared Similarities for Nystrom-Based Large Scale Spectral Clustering
نویسندگان
چکیده
The Nyström sampling provides an efficient approach for large scale clustering problems, by generating a low-rank matrix approximation. However, existing sampling methods are limited by their accuracies and computing times. This paper proposes a scalable Nyström-based clustering algorithm with a new sampling procedure, Minimum Sum of Squared Similarities (MSSS). Here we provide a theoretical analysis of the upper error bound of our algorithm, and demonstrate its performance in comparison to the leading spectral clustering methods that use Nyström sampling.
منابع مشابه
Nyström Sampling Depends on the Eigenspec- Trum Shape of the Data
Spectral clustering has shown a superior performance in analyzing the cluster structure. However, its computational complexity limits its application in analyzing large-scale data. To address this problem, many low-rank matrix approximating algorithms are proposed, including the Nyström method – an approach with proven approximate error bounds. There are several algorithms that provide recipes ...
متن کاملAn Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering
Here, an algorithm is presented for solving the minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real world data sets.
متن کاملImprovement of the Classification of Hyperspectral images by Applying a Novel Method for Estimating Reference Reflectance Spectra
Hyperspectral image containing high spectral information has a large number of narrow spectral bands over a continuous spectral range. This allows the identification and recognition of materials and objects based on the comparison of the spectral reflectance of each of them in different wavelengths. Hence, hyperspectral image in the generation of land cover maps can be very efficient. In the hy...
متن کاملFast Algorithms for Unsupervised Learning in Large Data Sets
The ability to mine and extract useful information automatically, from large datasets, is a common concern for organizations (having large datasets), over the last few decades. Over the internet, data is vastly increasing gradually and consequently the capacity to collect and store very large data is significantly increasing. Existing clustering algorithms are not always efficient and accurate ...
متن کاملA partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
متن کامل